🎮 Global Video Game Sales - Data Analysis¶
📌 Project Overview¶
This project analyzes global video game sales using a dataset from Kaggle. The aim of this project is to explore things like most popular genres, title, consoles and more.
📊 Dataset Details¶
- Source: Kaggle (Link : https://www.kaggle.com/datasets/asaniczka/video-game-sales-2024)
🔄 Data Loading & Preprocessing¶
Before diving into analysis, let us load the dataset and inspect its structure.
In [1]:
# Loading the dataset
import pandas as pd
video_games = pd.read_csv("/Users/venkat/Downloads/Global Video Game Sales/Video+Game+Sales/vgchartz-2024.csv")
# Checking the dataset
video_games.head()
Out[1]:
| img | title | console | genre | publisher | developer | critic_score | total_sales | na_sales | jp_sales | pal_sales | other_sales | release_date | last_update | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | /games/boxart/full_6510540AmericaFrontccc.jpg | Grand Theft Auto V | PS3 | Action | Rockstar Games | Rockstar North | 9.4 | 20.32 | 6.37 | 0.99 | 9.85 | 3.12 | 2013-09-17 | NaN |
| 1 | /games/boxart/full_5563178AmericaFrontccc.jpg | Grand Theft Auto V | PS4 | Action | Rockstar Games | Rockstar North | 9.7 | 19.39 | 6.06 | 0.60 | 9.71 | 3.02 | 2014-11-18 | 2018-01-03 |
| 2 | /games/boxart/827563ccc.jpg | Grand Theft Auto: Vice City | PS2 | Action | Rockstar Games | Rockstar North | 9.6 | 16.15 | 8.41 | 0.47 | 5.49 | 1.78 | 2002-10-28 | NaN |
| 3 | /games/boxart/full_9218923AmericaFrontccc.jpg | Grand Theft Auto V | X360 | Action | Rockstar Games | Rockstar North | NaN | 15.86 | 9.06 | 0.06 | 5.33 | 1.42 | 2013-09-17 | NaN |
| 4 | /games/boxart/full_4990510AmericaFrontccc.jpg | Call of Duty: Black Ops 3 | PS4 | Shooter | Activision | Treyarch | 8.1 | 15.09 | 6.18 | 0.41 | 6.05 | 2.44 | 2015-11-06 | 2018-01-14 |
In [2]:
# Exploring the dataset
video_games.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 64016 entries, 0 to 64015 Data columns (total 14 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 img 64016 non-null object 1 title 64016 non-null object 2 console 64016 non-null object 3 genre 64016 non-null object 4 publisher 64016 non-null object 5 developer 63999 non-null object 6 critic_score 6678 non-null float64 7 total_sales 18922 non-null float64 8 na_sales 12637 non-null float64 9 jp_sales 6726 non-null float64 10 pal_sales 12824 non-null float64 11 other_sales 15128 non-null float64 12 release_date 56965 non-null object 13 last_update 17879 non-null object dtypes: float64(6), object(8) memory usage: 6.8+ MB
🔧 Data Cleaning¶
We handle missing and inconsistent data, rename columns for the sake of visualization and handle any incorrect data types
In [3]:
# Check for null_values
video_games.isnull().sum()
Out[3]:
img 0 title 0 console 0 genre 0 publisher 0 developer 17 critic_score 57338 total_sales 45094 na_sales 51379 jp_sales 57290 pal_sales 51192 other_sales 48888 release_date 7051 last_update 46137 dtype: int64
In [4]:
# Converting release_date column into datetime format
video_games['release_date'] = pd.to_datetime(video_games['release_date'],errors='coerce')
video_games.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 64016 entries, 0 to 64015 Data columns (total 14 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 img 64016 non-null object 1 title 64016 non-null object 2 console 64016 non-null object 3 genre 64016 non-null object 4 publisher 64016 non-null object 5 developer 63999 non-null object 6 critic_score 6678 non-null float64 7 total_sales 18922 non-null float64 8 na_sales 12637 non-null float64 9 jp_sales 6726 non-null float64 10 pal_sales 12824 non-null float64 11 other_sales 15128 non-null float64 12 release_date 56965 non-null datetime64[ns] 13 last_update 17879 non-null object dtypes: datetime64[ns](1), float64(6), object(7) memory usage: 6.8+ MB
In [5]:
#Renaming columns for better understanding
video_games.rename(columns={'title':'Title','console':'Console','genre':'Genre','publisher':'Publisher','developer':'Developer'},inplace=True)
video_games.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 64016 entries, 0 to 64015 Data columns (total 14 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 img 64016 non-null object 1 Title 64016 non-null object 2 Console 64016 non-null object 3 Genre 64016 non-null object 4 Publisher 64016 non-null object 5 Developer 63999 non-null object 6 critic_score 6678 non-null float64 7 total_sales 18922 non-null float64 8 na_sales 12637 non-null float64 9 jp_sales 6726 non-null float64 10 pal_sales 12824 non-null float64 11 other_sales 15128 non-null float64 12 release_date 56965 non-null datetime64[ns] 13 last_update 17879 non-null object dtypes: datetime64[ns](1), float64(6), object(7) memory usage: 6.8+ MB
In [6]:
#Extracting release_year from the release_date column
video_games = video_games.assign(release_year = lambda x:x['release_date'].dt.year)
video_games.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 64016 entries, 0 to 64015 Data columns (total 15 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 img 64016 non-null object 1 Title 64016 non-null object 2 Console 64016 non-null object 3 Genre 64016 non-null object 4 Publisher 64016 non-null object 5 Developer 63999 non-null object 6 critic_score 6678 non-null float64 7 total_sales 18922 non-null float64 8 na_sales 12637 non-null float64 9 jp_sales 6726 non-null float64 10 pal_sales 12824 non-null float64 11 other_sales 15128 non-null float64 12 release_date 56965 non-null datetime64[ns] 13 last_update 17879 non-null object 14 release_year 56965 non-null float64 dtypes: datetime64[ns](1), float64(7), object(7) memory usage: 7.3+ MB
In [7]:
video_games.head()
Out[7]:
| img | Title | Console | Genre | Publisher | Developer | critic_score | total_sales | na_sales | jp_sales | pal_sales | other_sales | release_date | last_update | release_year | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | /games/boxart/full_6510540AmericaFrontccc.jpg | Grand Theft Auto V | PS3 | Action | Rockstar Games | Rockstar North | 9.4 | 20.32 | 6.37 | 0.99 | 9.85 | 3.12 | 2013-09-17 | NaN | 2013.0 |
| 1 | /games/boxart/full_5563178AmericaFrontccc.jpg | Grand Theft Auto V | PS4 | Action | Rockstar Games | Rockstar North | 9.7 | 19.39 | 6.06 | 0.60 | 9.71 | 3.02 | 2014-11-18 | 2018-01-03 | 2014.0 |
| 2 | /games/boxart/827563ccc.jpg | Grand Theft Auto: Vice City | PS2 | Action | Rockstar Games | Rockstar North | 9.6 | 16.15 | 8.41 | 0.47 | 5.49 | 1.78 | 2002-10-28 | NaN | 2002.0 |
| 3 | /games/boxart/full_9218923AmericaFrontccc.jpg | Grand Theft Auto V | X360 | Action | Rockstar Games | Rockstar North | NaN | 15.86 | 9.06 | 0.06 | 5.33 | 1.42 | 2013-09-17 | NaN | 2013.0 |
| 4 | /games/boxart/full_4990510AmericaFrontccc.jpg | Call of Duty: Black Ops 3 | PS4 | Shooter | Activision | Treyarch | 8.1 | 15.09 | 6.18 | 0.41 | 6.05 | 2.44 | 2015-11-06 | 2018-01-14 | 2015.0 |
In [8]:
video_games.describe()
Out[8]:
| critic_score | total_sales | na_sales | jp_sales | pal_sales | other_sales | release_year | |
|---|---|---|---|---|---|---|---|
| count | 6678.000000 | 18922.000000 | 12637.000000 | 6726.000000 | 12824.000000 | 15128.000000 | 56965.000000 |
| mean | 7.220440 | 0.349113 | 0.264740 | 0.102281 | 0.149472 | 0.043041 | 2006.359572 |
| std | 1.457066 | 0.807462 | 0.494787 | 0.168811 | 0.392653 | 0.126643 | 8.617813 |
| min | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1971.000000 |
| 25% | 6.400000 | 0.030000 | 0.050000 | 0.020000 | 0.010000 | 0.000000 | 2001.000000 |
| 50% | 7.500000 | 0.120000 | 0.120000 | 0.040000 | 0.040000 | 0.010000 | 2008.000000 |
| 75% | 8.300000 | 0.340000 | 0.280000 | 0.120000 | 0.140000 | 0.030000 | 2012.000000 |
| max | 10.000000 | 20.320000 | 9.760000 | 2.130000 | 9.850000 | 3.120000 | 2024.000000 |
Exploratory Data Analysis¶
📈 Sales Trends Over the Years¶
Analyzing yearly trends helps us understand how the gaming industry has evolved.
In [9]:
#This dataframe calculates sum of total_sales by release_year
annual_sales = video_games.groupby('release_year',as_index=False).agg({'total_sales':'sum'})
annual_sales.head()
Out[9]:
| release_year | total_sales | |
|---|---|---|
| 0 | 1971.0 | 0.00 |
| 1 | 1973.0 | 0.00 |
| 2 | 1975.0 | 0.00 |
| 3 | 1977.0 | 2.50 |
| 4 | 1978.0 | 2.36 |
In [10]:
# Creating a line chart here to plot total sales by year
import plotly.express as px
px.line(annual_sales,x='release_year',y='total_sales')
Top 10 titles¶
In [11]:
#This dataframe contains the ten highest selling titles by total_sales ranked from highest to lowest
top10_titles = (
video_games.groupby("Title",as_index=False).
agg({'total_sales':'sum'}).
sort_values('total_sales',ascending=False).
iloc[:10]
)
top10_titles.head(10)
Out[11]:
| Title | total_sales | |
|---|---|---|
| 13724 | Grand Theft Auto V | 64.29 |
| 5266 | Call of Duty: Black Ops | 30.99 |
| 5281 | Call of Duty: Modern Warfare 3 | 30.71 |
| 5273 | Call of Duty: Black Ops II | 29.59 |
| 5277 | Call of Duty: Ghosts | 28.80 |
| 5271 | Call of Duty: Black Ops 3 | 26.72 |
| 5280 | Call of Duty: Modern Warfare 2 | 25.02 |
| 20998 | Minecraft | 24.01 |
| 13719 | Grand Theft Auto IV | 22.53 |
| 5265 | Call of Duty: Advanced Warfare | 21.78 |
In [12]:
#Creating a bar chart of the top ten selling titles of all time
px.bar(top10_titles,x='Title',y='total_sales')
Building a Dashboard¶
In [ ]:
pip install jinja2==3.0.3
In [13]:
# Import necessary libraries
from dash import Dash, dcc, html
from dash.dependencies import Output, Input
from dash_bootstrap_templates import load_figure_template
import dash_bootstrap_components as dbc
import plotly.express as px
import pandas as pd
import numpy as np
# Load dataset, parse release_date as datetime, rename columns for clarity, and add a release_year column
video_games = (pd.read_csv("/Users/venkat/Downloads/Global Video Game Sales/Video+Game+Sales/vgchartz-2024.csv",parse_dates=['release_date']).
rename({'title':'Title','console':'Console','genre':'Genre','publisher':'Publisher',
'developer':'Developer'},axis=1).
assign(release_year = lambda x:x['release_date'].dt.year)
)
# Initializing the Dash app
app = Dash(__name__)
# Defining the layout of the Dash app
app.layout = dbc.Container([
html.H1("Video game explorer",style={'text-align':'center'}),
dbc.Row([
dbc.Col([
dbc.Card([
dcc.Markdown("**Select A Category**"),
dcc.Dropdown(
id="category-dropdown",
options=["Title","Genre","Publisher","Developer","Console"],value="Title",className="dbc"),
])
]),
dbc.Col([
dbc.Card([
dcc.Markdown("**Select A Region**"),
dcc.RadioItems(
id="region-radio",
options={
'total_sales':'World Total',
'na_sales':'North America',
'jp_sales':"Japan",
'pal_sales':'Europe/Africa',
'other_sales':'Rest of World'}, value='total_sales',className="dbc"),
])
]),]),
html.Br(),
dbc.Row(dcc.Graph(id="sales-line")),
html.Br(),
dbc.Row(dcc.Graph(id="rankings-bar")),
])
# Defining the callback function to update both graphs based on user selections
@app.callback(
Output('sales-line','figure'),
Output('rankings-bar','figure'),
Input('category-dropdown','value'),
Input('region-radio','value'),
)
# The two charts created above have been added to the dashboard which will allow us to select title, genre,
# publisher, developer and console with a dropdown and total_sales, jp_sales, na_sales, pal_sales
# and other_sales with radio buttons
def vg_plotter(category, region):
annual_sales=video_games.groupby("release_year",as_index=False).agg({region:"sum"})
fig = px.line(
annual_sales,
x='release_year',
y=region,
title=f"Video Game Sales in {region} Over Time"
).update_layout(title_x=0.5)
top10_sellers = (
video_games
.groupby(category, as_index=False)
.agg({region:"sum"})
.sort_values(region, ascending=False)
.iloc[:10]
)
fig2 = px.bar(
top10_sellers,
x=category,
y=region,
title=f"Top Video Game Sales by Category"
).update_layout(title_x=0.5)
return fig, fig2
#Run the dash app
if __name__ == "__main__":
app.run_server()
🏆 Key Insights & Takeaways¶
- The Sports and Action genres dominate total global sales.
- Video game sales peaked in the year 2008, with a decline afterward.
- EA Canada tops the list for Video Game sales when categorised based on Developer
- PS2 console dominates the industry.